ConvertFromTextToUnicode

Converts a string from any encoding to Unicode.
pascal OSStatus ConvertFromTextToUnicode (
                     TextToUnicodeInfo iTextToUnicodeInfo, 
                     ByteCount iSourceLen, 
                     ConstLogicalAddress iSourceStr, 
                     OptionsBits iControlFlags, 
                     ItemCount iOffsetCount, 
                     ByteOffset iOffsetArray[], 
                     ItemCount *oOffsetCount, 
                     ByteOffset oOffsetArray[], 
                     ByteCount iOutputBufLen, 
                     ByteCount *oSourceRead, 
                     ByteCount *oUnicodeLen, 
                     UniCharArrayPtr oUnicodeStr);
iTextToUnicodeInfo
A Unicode converter object of type TextToUnicodeInfo containing mapping and state information used for the conversion. Your application obtains a Unicode converter object using the function CreateTextToUnicodeInfo (page 125).
iSourceLen
The length in bytes of the source string to be converted.
iSourceStr
The address of the source string to be converted.
iControlFlags
Conversion control flags. You can use these bitmasks to set the control flags that apply to this function:
kUnicodeUseFallbacksMask
kUnicodeLooseMappingsMask
kUnicodeKeepInfoMask
kUnicodeStringUnterminatedMask
kUnicodeForceASCIIRangeMask
kUnicodeNoHalfwidthCharsMask
See "Conversion Control Flags" (page 110).
iOffsetCount
The number of offsets in the iOffsetArray parameter. Your application supplies this value. The number of entries in iOffsetArray must be fewer than the number of bytes specified in iSourceLen. If you don't want offsets returned to you, specify 0 (zero) for this parameter.
iOffsetArray
An array of type ByteOffset. On input, you specify the array that contains an ordered list of significant byte offsets pertaining to the source string. These offsets may identify font or style changes, for example, in the source string. All array entries must be less than the length in bytes specified by the iSourceLen parameter. If you don't want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount.
oOffsetCount
A pointer to a value of type ItemCount. On output, this value contains the number of offsets that were mapped in the output stream.
oOffsetArray
An array of type ByteOffset. On output, this array contains the corresponding new offsets for the Unicode string produced by the converter.
iOutputBufLen
The length in bytes of the output buffer pointed to by the oUnicodeStr parameter. Your application supplies this buffer to hold the returned converted string. The oUnicodeLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated. The relationship between the size of the source string and the Unicode string is complex and depends on the source encoding and the contents of the string.
oSourceRead
A pointer to a value of type ByteCount. On output, this value contains the number of bytes of the source string that were converted. If the function returns a kTECUnmappableElementErr result code, this parameter returns the number of bytes that were converted before the error occurred.
oUnicodeLen
A pointer to a value of type ByteCount. On output, this value contains the length in bytes of the converted stream.
oUnicodeStr
A pointer to an array used to hold a Unicode string. On input, this value points to the beginning of the array for the converted string. On output, this buffer holds the converted Unicode string. (For guidelines on estimating the size of the buffer needed, see the following discussion.) For a description of the UniCharArrayPtr data type, see Chapter 2, "Basic Text Types Reference."
function result
A result code. See "Text Encoding Conversion Manager Result Codes" (page 42) in the chapter "Basic Text Types Reference."
DISCUSSION
The ConvertFromTextToUnicode function converts a text string in a non-Unicode encoding to Unicode. You specify the source string's encoding in the Unicode mapping structure that you pass to the function CreateTextToUnicodeInfo (page 125) to obtain a Unicode converter object for the conversion. You pass the Unicode converter object returned by CreateTextToUnicodeInfo to ConvertFromTextToUnicode as the iTextToUnicodeInfo parameter.
In addition to converting a text string in any encoding to Unicode, the ConvertFromTextToUnicode function can map offsets for style or font information from the source text string to the returned converted string. The converter reads the application-supplied offsets, which apply to the source string, and returns the corresponding new offsets in the converted string. If you do not want the offsets at which font or style information occurs mapped to the resulting string, you should pass NULL for iOffsetArray and 0 (zero) for iOffsetCount.
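The constraints on the offset array described above can be checked before the converter is called. The following is a minimal sketch, not part of the Text Encoding Conversion Manager: the helper name SketchOffsetsAreValid is hypothetical, and plain unsigned long stands in for the ByteOffset, ItemCount, and ByteCount types so that the fragment compiles without the Mac OS headers. Treating equal neighboring offsets as "ordered" is also an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the offsets satisfy the documented
   constraints for iOffsetArray and iOffsetCount, 0 otherwise.
   unsigned long stands in for ByteOffset, ItemCount, and ByteCount. */
int SketchOffsetsAreValid(const unsigned long offsets[],
                          unsigned long offsetCount,
                          unsigned long sourceLen)
{
    unsigned long i;

    /* A count of 0 (with a NULL array) is the documented way to opt out. */
    if (offsetCount == 0)
        return 1;
    if (offsets == NULL)
        return 0;

    /* The number of entries must be fewer than the number of source bytes. */
    if (offsetCount >= sourceLen)
        return 0;

    for (i = 0; i < offsetCount; i++) {
        /* Every entry must be less than the source length in bytes. */
        if (offsets[i] >= sourceLen)
            return 0;
        /* Entries must be ordered; equal neighbors are accepted here,
           which is an assumption of this sketch. */
        if (i > 0 && offsets[i] < offsets[i - 1])
            return 0;
    }
    return 1;
}
```

An application might run such a check in debug builds before passing the array as iOffsetArray, because an invalid array is a plausible cause of a paramErr result.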
Your application must allocate a buffer to hold the resulting converted string and pass a pointer to the buffer in the oUnicodeStr parameter. To determine the size of the output buffer to allocate, you should consider the size of the source string, its encoding type, and its content in relation to the resulting Unicode string.
For example, for 1-byte encodings such as MacRoman, the Unicode string will be at least double the size of the source string (more if it uses noncomposed Unicode); for MacArabic and MacHebrew, the corresponding Unicode string could be up to six times as big. For most 2-byte encodings, for example Shift-JIS, the Unicode string will be less than double the size. For international robustness, your application should allocate a buffer three to four times larger than the source string.
If the output Unicode text is actually UTF-8 (which could occur beginning with the current release of the Text Encoding Conversion Manager, version 1.2.1), the UTF-8 buffer pointer must be cast to UniCharArrayPtr before it can be passed as the oUnicodeStr parameter. Also, the output buffer length will have a wider range of variation than for UTF-16: for ASCII input, the output will be the same size as the input; for Han input, the output will be twice as big; and so on.
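The guideline of allocating three to four times the source length can be expressed as a small helper. This is only a sketch under stated assumptions: the name SketchEstimateUnicodeBufferSize is hypothetical, size_t stands in for ByteCount, and the factor of 4 is simply the upper end of the recommended range.

```c
#include <stddef.h>

/* Hypothetical helper: computes a conservative output buffer size in bytes
   for a source string of sourceLen bytes, using the upper end (4x) of the
   "three to four times larger" guideline.
   Returns 1 and stores the size in *outBytes, or returns 0 if the
   multiplication would overflow size_t. */
int SketchEstimateUnicodeBufferSize(size_t sourceLen, size_t *outBytes)
{
    const size_t factor = 4;

    /* Reject lengths whose product with the factor cannot be represented. */
    if (sourceLen > (size_t)-1 / factor)
        return 0;

    *outBytes = sourceLen * factor;
    return 1;
}
```

Because MacArabic and MacHebrew text can expand by up to six times, a buffer sized this way may still be too small, so the caller should remain prepared for a kTECOutputBufferFullErr result.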
The function returns a noErr result code if it has completely converted the input string to Unicode without using fallback characters. If the function returns a paramErr, kTECTableFormatErr, or kTECGlobalsUnavailableErr result code, it did not convert the string.
If the function returns kTECBufferBelowMinimumSizeErr, the output buffer was too small to allow conversion of any part of the input string. You need to increase the size of the output buffer and try again.
If the function returns the kTECUsedFallbacksStatus result code, the function has completely converted the string using one or more fallback characters. This can only happen if you set the Unicode-use-fallbacks control flag.
If the function returns kTECOutputBufferFullErr, the output buffer was not big enough to completely convert the input; oSourceRead indicates the amount of input converted. To convert the remainder of the input string, you can call the function again with another output buffer, or with the same output buffer after copying its contents.
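The resume pattern for a full output buffer can be sketched without the Mac OS headers by substituting a callback for the converter. Everything named Sketch... below is hypothetical: the status values are placeholders rather than the real result codes, SketchConvertProc carries only the parameters that matter for the loop, and SketchWidenASCII is a toy converter that merely widens 7-bit ASCII so the fragment can run on its own. Real code would call ConvertFromTextToUnicode and compare against kTECOutputBufferFullErr instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder status values for this sketch only. */
enum { kSketchNoErr = 0, kSketchOutputBufferFull = 1 };

/* Hypothetical callback standing in for the converter call. */
typedef int (*SketchConvertProc)(const unsigned char *src, size_t srcLen,
                                 size_t outBufLen, size_t *srcRead,
                                 size_t *outLen, uint16_t *out);

/* Toy converter: widens 7-bit ASCII bytes to 16-bit code units and reports
   a full buffer when not all of the input fits. */
int SketchWidenASCII(const unsigned char *src, size_t srcLen,
                     size_t outBufLen, size_t *srcRead,
                     size_t *outLen, uint16_t *out)
{
    size_t capacity = outBufLen / sizeof(uint16_t);
    size_t n = srcLen < capacity ? srcLen : capacity;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = src[i];
    *srcRead = n;
    *outLen = n * sizeof(uint16_t);
    return n < srcLen ? kSketchOutputBufferFull : kSketchNoErr;
}

/* Converts all of src through a small reusable chunk buffer, copying each
   chunk into dest before the buffer is reused. Returns the number of bytes
   written to dest, or (size_t)-1 on failure. */
size_t SketchConvertAll(SketchConvertProc convert,
                        const unsigned char *src, size_t srcLen,
                        uint16_t *chunk, size_t chunkBytes,
                        unsigned char *dest, size_t destBytes)
{
    size_t consumed = 0, written = 0;

    for (;;) {
        size_t srcRead = 0, outLen = 0;
        int status = convert(src + consumed, srcLen - consumed,
                             chunkBytes, &srcRead, &outLen, chunk);

        if (status != kSketchNoErr && status != kSketchOutputBufferFull)
            return (size_t)-1;          /* some other error: give up */
        if (outLen > destBytes - written)
            return (size_t)-1;          /* destination too small */

        /* Save this chunk, then advance past the input already converted. */
        memcpy(dest + written, chunk, outLen);
        written += outLen;
        consumed += srcRead;

        if (status == kSketchNoErr)
            return written;             /* all input converted */
        if (srcRead == 0)
            return (size_t)-1;          /* no progress: chunk buffer too small */
    }
}
```

The check for zero progress corresponds loosely to the kTECBufferBelowMinimumSizeErr case, in which the only remedy is a larger output buffer.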
If the function returns kTECPartialCharErr, the input buffer ended with an incomplete multibyte character. If you have subsequent input text available, you can append the unconverted input from this call to the beginning of the subsequent input text and call the function again.
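Carrying the unconverted bytes over to the next call amounts to a small buffer operation, sketched below. The helper name SketchPrependUnconverted is hypothetical, size_t stands in for ByteCount, and the sketch assumes that the value returned in oSourceRead marks where the unconverted tail of the previous input begins.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds the input for the next call by placing the
   unconverted tail of the previous input (everything from sourceRead on)
   in front of the newly arrived bytes.
   Returns the combined length in bytes; 0 means that there is nothing to
   convert or that the arguments were inconsistent or did not fit in next. */
size_t SketchPrependUnconverted(const unsigned char *prev, size_t prevLen,
                                size_t sourceRead,
                                const unsigned char *more, size_t moreLen,
                                unsigned char *next, size_t nextCap)
{
    size_t tailLen;

    /* The converter cannot have read more than it was given. */
    if (sourceRead > prevLen)
        return 0;
    tailLen = prevLen - sourceRead;

    /* Make sure the tail plus the new bytes fit in the destination. */
    if (tailLen > nextCap || moreLen > nextCap - tailLen)
        return 0;

    memcpy(next, prev + sourceRead, tailLen);   /* incomplete character first */
    memcpy(next + tailLen, more, moreLen);      /* then the subsequent input */
    return tailLen + moreLen;
}
```

For instance, if a Shift-JIS input ends with the first byte of a 2-byte character, that single byte is moved to the front of the next input so that the converter sees the character whole on the following call.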
If the function returns kTECUnmappableElementErr because an input text element could not be mapped to Unicode, then the function did not completely convert the input string. This can only happen if you did not set the Unicode-use-fallbacks control flag. You can set this flag and then convert the remaining unconverted input, or take some other action.

SPECIAL CONSIDERATIONS
This function modifies the contents of the Unicode converter object you pass in the iTextToUnicodeInfo parameter.